refactor: remove allow_resize support#766
Conversation
Remove the allow_resize column config field and the sync resize fallback paths now that row-count changes belong at workflow boundaries. Enforce size-preserving batch replacement and pre-batch processors, update custom column validation, and revise docs/tests for workflow chaining migration.
|
Fern preview: https://nvidia-preview-pr-766.docs.buildwithfern.com/nemo/datadesigner
|
Greptile SummaryThis PR completes the removal of the deprecated
|
| Filename | Overview |
|---|---|
| packages/data-designer-config/src/data_designer/config/base.py | Removes allow_resize field from SingleColumnConfig and the validator that prevented it from being combined with skip. |
| packages/data-designer-config/src/data_designer/config/fingerprint.py | Bumps CONFIG_HASH_VERSION 1→2, invalidating all pre-removal checkpoints and preventing unsafe resume from allow_resize-era runs. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py | Removes _resolve_async_compatibility, _has_allow_resize_columns, _cell_resize_mode, and all resize-aware branches; both build and build_preview now use DATA_DESIGNER_ASYNC_ENGINE unconditionally. Also fixes config-read error handling to return INCOMPATIBLE instead of COMPATIBLE. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/utils/dataset_batch_manager.py | replace_buffer loses allow_resize param; row-count equality is now unconditionally enforced and the _num_records_list resize-bookkeeping path is removed. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/utils/processor_runner.py | PRE_BATCH row-count changes now always raise inside _run_stage; run_pre_batch_on_df drops the strict_row_count knob and relies on the shared enforcement. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/utils/skip_tracker.py | restore_skip_metadata loses the allow_resize parameter; the 1:1 row-identity check is now always enforced (previously conditional on allow_resize=False). |
| packages/data-designer/src/data_designer/interface/data_designer.py | _resolve_client_concurrency_mode drops the allow_resize sync-fallback branch; ClientConcurrencyMode now resolves purely from DATA_DESIGNER_ASYNC_ENGINE. |
| packages/data-designer-engine/tests/engine/dataset_builders/test_dataset_builder.py | All allow_resize integration tests removed; new test_pre_batch_processor_row_count_change_rejected and test_build_resume_always_raises_on_unreadable_stored_config cover the new invariants. |
Reviews (3): Last reviewed commit: "fix: address allow_resize review feedbac..." | Re-trigger Greptile
|
Thanks for putting this together, @andreatgretel — this is a clean, thorough teardown of SummaryThis PR removes the deprecated FindingsSuggestions — Take it or leave it
What Looks Good
VerdictShip it (with nits) — This is a tidy, self-consistent removal with matching test and doc updates and no dangling references. The two suggestions are optional polish; nothing blocks merge. One process note for the author (not a code issue): the PR checklist shows This review was generated by an AI assistant. |
|
Nice work on this cleanup — the core removal is nicely concentrated and the public docs are much clearer about using workflow boundaries. SummaryThis branch removes FindingsWarnings — Worth addressing
Suggestions — Take it or leave it
What Looks Good
VerdictNeeds changes — Please address the processor docs mismatch and the residual tracked plan references. The This review was generated by an AI assistant. |
johnnygreco
left a comment
There was a problem hiding this comment.
Requesting changes based on the review feedback already posted: please address the processor docs mismatch and the residual tracked plan references for the removed allow_resize behavior before merge.
|
Thanks for putting this together, @andreatgretel — the surface-level removal is tidy and the docs/test updates land in the right places. SummaryThis PR deletes FindingsI'm scoping this review to issues not already raised by the existing greptile / @johnnygreco reviews. Those reviews already cover:
Suggestions — Take it or leave it
What Looks Good
VerdictShip it (with nits) once @johnnygreco's CHANGES_REQUESTED items land — the residual One process note: the PR checklist still shows This review was generated by an AI assistant. |
|
Following up on my earlier review — circling back on resume specifically. I think there's one Warning-tier concern that the existing reviews haven't surfaced. Warning — Worth addressing
This follow-up was generated by an AI assistant. |
|
@johnnygreco @nabinchha thanks for the detailed review. I pushed
Greptile re-reviewed the latest commit and is green. Full CI is passing. |
johnnygreco
left a comment
There was a problem hiding this comment.
Thanks for the follow-up, @andreatgretel. I re-checked the requested changes against dd7e135: the processor docs now state both generation-time processor stages are row-count invariant, the tracked plan references are clearly marked historical/completed, and the stale custom generator return annotations are narrowed. I also verified the follow-up resume/fingerprint fixes and the added tests. My requested changes are resolved.
📋 Summary
Remove the deprecated
allow_resizecolumn config field now that row-count changes are handled at workflow boundaries. This also removes the sync resize fallback paths and updates docs/tests to point users toward workflow chaining for expansion and filtering.🔗 Related Issue
Part of #552
🔄 Changes
allow_resizefromSingleColumnConfig; configs that pass it now fail at the Pydantic boundary as an extra field.DatasetBuilder,DatasetBatchManager, skip metadata restore, and custom cell-by-cell generator validation.DATA_DESIGNER_ASYNC_ENGINE=0opt-out.🔍 Attention Areas
base.py- public config schema no longer acceptsallow_resize.dataset_builder.py- removes the sync resize fallback and resize-specific builder branches.🧪 Testing
make testpasses (not run; targeted affected suite passed)Additional validation:
.venv/bin/ruff check --fix ..venv/bin/ruff format ..venv/bin/ruff check ..venv/bin/ruff format --check .PYTHONDONTWRITEBYTECODE=1 .venv/bin/pytest -p no:cacheprovider packages/data-designer-config/tests/config/test_columns.py packages/data-designer-config/tests/config/test_skip_config.py packages/data-designer-engine/tests/engine/test_validation.py packages/data-designer-engine/tests/engine/column_generators/generators/test_custom.py packages/data-designer-engine/tests/engine/column_generators/generators/test_async_generators.py packages/data-designer-engine/tests/engine/dataset_builders/utils/test_dataset_batch_manager.py packages/data-designer-engine/tests/engine/dataset_builders/utils/test_skip_tracker.py packages/data-designer-engine/tests/engine/dataset_builders/test_async_builder_integration.py packages/data-designer-engine/tests/engine/dataset_builders/test_dataset_builder.py packages/data-designer/tests/interface/test_data_designer.py(368 passed)✅ Checklist